Contents

The notebook is structured as follows:

  1. Introduction
  1. Exploration of the dataset
  1. Pre-processing
  1. Modeling
  1. Evaluation
  1. Summary and prospects

1. Introduction

1.1. Problem statement

Deliverables:

1.2. Proposed solution

The problem is to find a function $G$ that predicts tool-tip forces from position coordinates: $$ \vec{f} = G(x,y,z,a,b,c) $$ Classically, such a mapping could be derived from a parametrized dynamical model using Lagrange or Newton-Euler methods under a rigid-body assumption. Instead of doing that, I'm going to approximate $G$ with a neural network.

Here are a few sources I looked at for inspiration:

2. Exploration of the dataset

Before training a model, it is always good to look at the data. Let's have a look at time series plots, histograms, and correlations.

Import the datasets into pd.DataFrame

Look at a summary of what's in the dataframe

Make separate lists of input features (position coordinates and euler angles) and outputs (forces)
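The three steps above can be sketched as follows. A synthetic dataframe stands in for the real CSVs here (the column names are assumptions, not the actual file's schema); for the real runs you would load with `pd.read_csv(...)` instead.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real dataset: 6 pose columns + 6 force columns.
rng = np.random.default_rng(0)
cols = ["x", "y", "z", "a", "b", "c", "fx", "fy", "fz", "tx", "ty", "tz"]
df = pd.DataFrame(rng.normal(size=(100, 12)), columns=cols)

# Quick summary of what's in the dataframe
print(df.describe().T[["mean", "std", "min", "max"]])

# Separate lists of inputs (positions + Euler angles) and outputs (forces)
input_features = ["x", "y", "z", "a", "b", "c"]
output_features = ["fx", "fy", "fz", "tx", "ty", "tz"]
X, Y = df[input_features], df[output_features]
print(X.shape, Y.shape)  # (100, 6) (100, 6)
```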

2.1. Correlations

First look at correlations for R1 and R2 separately, for simplicity

seaborn.pairplot can be helpful to get an idea of correlations

Plot correlation matrices
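A minimal sketch of the correlation-matrix step, on a toy frame with one injected linear dependence so the matrix shows some structure (the heatmap itself can then be drawn with `seaborn.heatmap` or `plt.imshow`):

```python
import numpy as np
import pandas as pd

# Toy frame: fx depends strongly on x, not at all on y
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(500, 3)), columns=["x", "y", "fx"])
df["fx"] = 0.8 * df["x"] + 0.2 * rng.normal(size=500)

corr = df.corr()  # Pearson correlation matrix
print(corr.round(2))
# fx vs. x should come out strongly positive; fx vs. y near zero
```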

Observations:

Now let's look at correlations between R1 and R2

Observations:

2.2. Time series plots

Plot variables in the dataset to see what things look like

2.3. Histograms

Make histograms of variables in the dataset

3. Pre-processing

3.1. Addition of higher-order derivatives of input features

I noticed that adding higher-order derivatives of the position coordinates as input features let the models predict forces more accurately on datasets not seen during training.

First (velocity) and second (acceleration) order derivatives had a notable effect on the performance (more about that in section 4). I also tried to include up to 6th order derivatives (3rd=jerk, 4th=snap, 5th=crackle, 6th=pop) [1]. These had a smaller impact but did help in some cases.

[1] https://en.wikipedia.org/wiki/Fourth,_fifth,_and_sixth_derivatives_of_position
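One way to append such derivative features is `np.gradient`, which uses central differences on the interior. This is a sketch under the assumption of a fixed sampling interval `dt`; the `_d1`, `_d2`, ... column suffixes and the helper name are mine, not the notebook's.

```python
import numpy as np
import pandas as pd

# Synthetic position signal sampled at a fixed rate (dt assumed constant)
dt = 0.01
t = np.arange(0, 1, dt)
df = pd.DataFrame({"x": np.sin(2 * np.pi * t)})

def add_derivatives(df, cols, order, dt):
    """Append derivatives of `cols` up to `order` as extra input features."""
    out = df.copy()
    for col in cols:
        signal = out[col].to_numpy()
        for k in range(1, order + 1):
            signal = np.gradient(signal, dt)  # k-th numerical derivative
            out[f"{col}_d{k}"] = signal
    return out

df_deriv = add_derivatives(df, ["x"], order=2, dt=dt)
print(df_deriv.columns.tolist())  # ['x', 'x_d1', 'x_d2']
```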

Create feature lists

Correlations to higher order derivatives

$x$-axis only

All coordinates

3.2. Scaling of input features and outputs

Specify which datasets to use

Hold out 'Test2' for testing of model trained on data from 'Test1' and 'Test4'.

Split dataframe into X (features) and Y (outputs)

Feature scaling

Both MinMaxScaler and StandardScaler were tried; the former gave slightly more robust performance.

3.3. Preparing input features for RNN training

Including multiple time steps of the input features and training an RNN to predict forces improved the performance over using a DNN. After some experimentation, I concluded that about 20 timesteps worked pretty well.
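Building those 20-step windows can be sketched like this (the helper name and the exact window/target alignment are my assumptions; the notebook may slice differently):

```python
import numpy as np

def make_windows(X, Y, n_steps=20):
    """For each time i, stack the n_steps rows of X ending at i as one sample."""
    idx = range(n_steps - 1, len(X))
    Xw = np.stack([X[i - n_steps + 1 : i + 1] for i in idx])
    Yw = Y[n_steps - 1 :]  # target aligned with the last row of each window
    return Xw, Yw

X = np.arange(100 * 12, dtype=float).reshape(100, 12)
Y = np.arange(100 * 6, dtype=float).reshape(100, 6)
Xw, Yw = make_windows(X, Y, n_steps=20)
print(Xw.shape, Yw.shape)  # (81, 20, 12) (81, 6)
```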

3.4. Splitting of data into train/validation/test sets

Create train/val/test sets with 12 (positions and angles) and 36 input features (positions, angles, and up to 2nd order derivatives)
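A chronological split avoids leaking adjacent time steps between sets. Only the 70% training fraction comes from the notebook; the 15/15 val/test ratio below is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 36))  # 12 base features + 1st/2nd derivatives
Y = rng.normal(size=(1000, 6))

# Chronological 70/15/15 split -- no shuffling across time
n = len(X)
i_tr, i_va = int(0.7 * n), int(0.85 * n)
X_train, Y_train = X[:i_tr], Y[:i_tr]
X_val, Y_val = X[i_tr:i_va], Y[i_tr:i_va]
X_test, Y_test = X[i_va:], Y[i_va:]
print(len(X_train), len(X_val), len(X_test))  # 700 150 150
```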

4. Modeling

Finally, we're getting to the exciting part of training some neural nets.

4.1. Linear Regression

Let's start with a simple linear regression.
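The single-input baseline can be sketched with scikit-learn on synthetic data (the notebook's own cells may use a single-neuron Keras model trained by gradient descent instead; the linear relation below is fabricated for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in: force fx1 depends linearly on position x1 plus noise
rng = np.random.default_rng(0)
x1 = rng.uniform(-1, 1, size=(500, 1))
fx1 = 3.0 * x1[:, 0] + rng.normal(scale=0.1, size=500)

model = LinearRegression().fit(x1, fx1)
print(model.coef_, model.score(x1, fx1))  # slope near 3.0, R^2 near 1
```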

Predict $f_{x_1}$ from $x_1$

Plot loss vs. epoch

Plot predictions vs. true values

Plot $f_{x_1}$ vs. $x_1$ (scaled)

Predict all 6 forces from 12 input features

Plot loss vs. epoch

Plot predictions vs. true values

4.2. DNN regression

I experimented by varying the number of layers, dropout rate, learning rate, and batch size, and by adding batch normalization layers; the model below achieves pretty good performance.
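A sketch of such a fully-connected regressor in Keras: 12 pose features in, 6 forces out. The layer widths, dropout rate, and learning rate here are illustrative placeholders, not the notebook's tuned values.

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(12,)),              # 12 pose features (positions + angles)
    layers.Dense(64, activation="relu"),
    layers.BatchNormalization(),
    layers.Dropout(0.2),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.2),
    layers.Dense(6),                       # 6 force outputs, linear head
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3), loss="mse")
print(model.output_shape)  # (None, 6)
```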

Predict all 6 forces from 12 input features

Save DNN model

Plot loss vs. epoch

Compare prediction vs. true values for the test set

Predict all 6 forces from 36 input features (12+2x12)

Up to 2nd order derivatives for 12 input features.

Save DNN model

Plot loss vs. epoch

Compare prediction vs. true values for the test set

Predict all 6 forces from 84 input features (12+6x12)

Up to 6th order derivatives for 12 input features.

Save DNN model

Plot loss vs. epoch

Compare prediction vs. true values for the test set

4.3. RNN regression

I experimented with LSTM, GRU, and SimpleRNN layers. Performance was relatively similar, but the LSTM seemed to do slightly better.
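The recurrent variant can be sketched as below, consuming 20-step windows of the 12 base features. The single 64-unit LSTM is an assumption for illustration; GRU or SimpleRNN drop in by swapping that one layer.

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(20, 12)),  # (timesteps, features) per sample
    layers.LSTM(64),              # swap for layers.GRU / layers.SimpleRNN
    layers.Dense(6),              # 6 force outputs
])
model.compile(optimizer="adam", loss="mse")
print(model.output_shape)  # (None, 6)
```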

As shown in section 5, the RNN did a significantly better job than the DNN at predicting forces, especially on the dataset not seen during training.

Save RNN model

Plot loss vs. epoch

Compare prediction vs. true values for the test set

5. Evaluation

5.1. Loss on test sets

Linear models:

Loss after 1000 and 300 epochs for DNN and RNN respectively:

DNN:

RNN:

Loss on Test1, Test2, Test4 (no data from Test2 was included in the training)

>> Note that no part of Test2 was included during training <<

Observations:

5.2. Prediction error

Model predictions on the different test sets

Observations:

R2 score

The coefficient of determination ($R^2$) is the proportion of the variation in the dependent variable (here, the measured forces) that is explained by the model's predictions. It ranges from negative infinity (arbitrarily bad predictions) to +1 (perfect prediction); always predicting the mean of the measurements scores 0.
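A tiny worked example with `sklearn.metrics.r2_score` (the numbers are made up for illustration):

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])

# SS_res = 0.01 + 0.01 + 0.04 + 0.04 = 0.10; SS_tot = 5.0
print(r2_score(y_true, y_pred))  # 0.98

# The constant mean predictor scores exactly 0
print(r2_score(y_true, np.full_like(y_true, y_true.mean())))  # 0.0
```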

Observations:

Pearson correlation coefficient

The Pearson correlation coefficient ($r$) can range from -1 to +1.
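Unlike $R^2$, the Pearson coefficient is invariant to scale and offset of the predictions, which is worth keeping in mind when comparing the two metrics. A small illustration with `scipy.stats.pearsonr` (made-up numbers):

```python
import numpy as np
from scipy.stats import pearsonr

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = 2.0 * y_true  # predictions off by a factor of 2

r, p = pearsonr(y_true, y_pred)
print(r)  # 1.0 -- perfectly linearly correlated despite the 2x scale error
```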

Observations:

5.3. Time series plots of prediction vs. ground truth

6. Summary and prospects

Six neural networks (three DNNs and three RNNs) were trained to predict tool-tip forces from input positions and angles (and their higher-order derivatives). The models were trained on 70% of the combined Test1 and Test4 datasets. Test2 was left out of the training entirely to evaluate the performance of the models on unseen runs of the robots.

The RNN models clearly outperformed the DNN models. This is especially clear from looking at the predicted tool-tip forces as a function of time for the Test2 dataset (unseen during training) above.

Adding the first-order (velocity) and second-order (acceleration) derivatives of the positions and angles as additional input features reduced the loss and significantly improved performance. Adding up to 6th order derivatives seemed helpful in some earlier DNN models (with fewer layers) that I trained (not shown in this notebook). However, the performance comparisons between DNN-84 and DNN-36 in this notebook indicate that the impact is marginal.

Although the RNN did a decent job, all models struggled to generalize to Test2. A more comprehensive hyperparameter optimization could be done to improve the performance. Adding training data from more runs with robots would probably help a lot.

You would probably want to further optimize and tailor the model to the application. A deeper (and recurrent) model will likely perform better if the goal is to achieve optimal accuracy. However, if the goal is to run fast inference on resource-constrained hardware, you'd want to optimize a smaller model that is good enough for the job.